Abstract



The analytically derived expected asymptotic standard errors (SEs) of ML 
(maximum likelihood) item estimates can be predicted by a mathematical function 
without examinees’ responses to test items. The empirically determined SEs of 
MMLE (marginal maximum likelihood estimation) / Bayesian item estimates can 
be obtained when the same set of items is repeatedly estimated from test data.

Understanding the consistency of SEs yielded from both approaches is of
primary concern for the applications of the analytic SEs. In most cases, the SEs
yielded from both approaches were very similar, especially for the generalized
partial credit model (Muraki, 1992). This finding encourages test practitioners and
researchers to apply the asymptotic SEs of item estimates to the following
applications: (a) practical testing situations, (b) item-linking studies, and (c)
predicting the SEs of equating scores for the IRT true-score procedure (Lord,
1982) without examinees’ responses to test items.

A three-D graphical presentation of the analytic SEs of item estimates has
also been provided for a better understanding of several widely used IRT models.

Key Index: Asymptotic Standard Errors; Item Parameter Estimates;
Item Response Theory (IRT)






Applications of the Analytically Derived Asymptotic Standard Errors 
of IRT Item Parameter Estimates 

I. Introduction 

A. Background and Motivation 

IRT has been widely employed in large-scale testing programs

degr:ro7ptci:r„?:?r -he ■ 

estimates. Four primary factors are often highlighted in the literature. The first one is related to

(Ackerman, 1992), e.g., local independence. The second one is related to the estimation method, e.g., joint
maximum likelihood estimation (JMLE) that may not converge to the true

992). Also, the marginal maximum likelihood estimation (MMLE) may not produce the values

S o'nSS TheLtd o ofVaml eL'm , ot 

The third one is associated with model misfit (Tam & Li, 1997). For instance, if

unidime^"^’°"f response matrices from a test is multidimensional rather than 

mol^TXd LLhLLe it ““><* “““■ »h=n a nnid.mens.onal IRT 

moaei is applied to these item response matnces (Ackerman 1992- Rerka^p i tu r 

factor is related to practical limitations. For instance, the sample size used for parameter

etrS 7 are not heterogeneous so that standard 

errors (SEs) of item estimates for too hard or too easy items become large (Stocking, 1990).

estimate ThTs?"'-/' an index of the precision of an Z 

i7.h;f dZLl^L -aightfo^ardly calculated due to the prohTeLls Jd a^ove. 

f«r t ’ ru selected fits test data, the maximum likelihood (ML) estimate is chosen 

c71 7 m c7'°7- abthty disnibution is known, L anL^ppr^^h to 

compute SEs for any set of item parameters and sample size exists (Thissen & Wainer, 1982).
r^7t?“'i:"Tf‘’‘’T Sach mathematicarexpressil 

diL^LoL IR^rd r'T‘* “"dimensionally 

(kf 7iLTz & yL.7 ”°7h=“ “dimensionally polytomous mT models 

pniioT 'Si n ’ ^ multidlmensionally dichotomous IRT models (Li & Lissitz 

2000). men all assumptions used to yield SEs of item estimates are not completelv to^Tl 

aWs's'en & Wame°r 

The analytically expected asymptotic SEs (called AEA-SEs) of item estimates can be

used for detecting the potential problems of the application of ML estimation to an IRT

model without real test data. For instance, Thissen and Wainer (1982) used a three-D graphical
representation to explore the relationship between the expected SEs of item difficulties and the

(T"Ter.o Lr itslfTr'^^^ .de“«er “ o^ 

magLitudes 7f T fs of ,7 Mt gi J 7 ” ^"“iy Nearly showed that the 

magnitudes of SEs of the ML difficulty estimates were unacceptably large except when the
sample sizes are enormously large (e.g., more than 100,000 examinees). This result suggested
that ML estimation is not an appropriate candidate to be chosen for estimating Three-PL item parameters
when the lower asymptote parameter is poorly estimated (Gruijter, 1984). Similarly, this
application can be extended to explore other IRT models, such as the generalized partial credit
model (GPCM; Muraki, 1992), which is widely employed in current testing programs such
as the National Assessment of Educational Progress (NAEP; Beaton & Zwick, 1992).

When researchers or test practitioners are interested in a set of item parameters found in the
literature, for which the corresponding SEs of item estimates were not reported, the analytic
approach provides them a sense of how large the standard errors of the ML item estimates might be
under specific situations. For example, when Li and Lissitz (2000) evaluated how sensitive the
three multidimensional IRT (MIRT) item-linking methods (developed in their study) were to the
accuracy of item parameter estimates, the analytic approach was used for modeling random

f .k' (^°™24b, see Reckase, 1985). The cumentTppTcation 

randn 7 study of investigating which item-linking methods can tolerate^the 

random (or sampling) errors of item estimates better. 

ct H • ^ be exercised on the AEA approach to item linking 

studies. For instance, it could happen that when the Three-PL item linking was conducted fonger 

Th "" r estimates of item linking coefficients than shorter tests (Li, Lissitz & 
Yang, 1999). The reasons for this will be discussed later. This unexpected result highlighted the
issue of how to appropriately employ the AEA-SEs of item estimates in the context of research

The MMLE/Bayesian estimation, which incorporates the additional information of the
priors of item estimates, is often employed in the estimation process, such as with the IRT

S iXTrTf PARSCALE ( Muraki & 

of mm™ polytomously scored items. The empirical SEs and BIASs 

of MMLE/Bayesian item estimates (called EMB-SEs) can be obtained by the replication

leases SmB Methodology. In general, as the number of replications 

increases, the EMB-SE estimate becomes relatively stable and accurate. 

The AEA-SE is much less tedious to obtain than the replication approach. When we
attempt to apply it, the issue of how consistent the AEA-SEs and EMB-SEs are under similar
conditions is critical to test practitioners.

B. Research Purposes 

As indicated, without real test data the three-D graphical presentation for AEA-SEs of 

; “eTp^ m re^ ^ f w estim^it meth^^^ 

LhpI th w ^ . (Thissen & Warner, 1982). As the GPCM has been increasingly used to 
model polytomously scored item responses, the extension of this graphical procedure to the
GPCM will provide test practitioners a better understanding of this model.

rpnu consistency of SEs yielded from the analytic approach and from the 

estimatr’J successfully applying the asymptotic SEs of item 

estimates to practical testing situations and in the context of research studies. This issue will be
pursued in this study. We expect the results generated from this study to provide valuable

c“r EMB-SEsLtL se“Le for 

descriptions on how to employ the AEA-SEs in the following
applications will be included: (a) practical testing situations, (b) item-linking studies, and (c)
predicting the SEs of equating scores for the IRT true-score procedure (Lord, 1982) without
examinees’ responses to test items.



II. Expected SEs of Item Estimates

H f of a test with mixed item formats, e g tests 

used for NAEP. These may consist of multiple-choice and short-response items (dichotomously

“s^pitae that often occurwhen 

TRT collected from mixed-format items, simultaneously fitting different 

IRT models to different types of items of the same scale (as described by Thissen 1993) has 
been employed in several large-scale testing programs (e.g., NAEP). When modeling this sort of 

lenera "h handling the item responses 
generated by the short-response dichotomous items. The Three-PL is used for the multiple-choice

dichotomous item responses and the GPCM for the constructed-response (or essay)

Srr 'hree models were explored with 

test data from a mixed-format test with dichotomously and polytomously scored items. 

1. Three-PL and Two-PL Logistic Models

The commonly used Three-PL logistic IRT model was used to model the dichotomously
scored items in this study. Under the Three-PL model, the probability, $P_{ji}$, of a correct response to
an item i for an examinee j with ability $\theta_j$ is given by (Lord, 1980):

$$P_{ji}(\theta_j) = c_i + (1 - c_i)\,\frac{\exp[D a_i(\theta_j - b_i)]}{1 + \exp[D a_i(\theta_j - b_i)]} \qquad (1)$$

where

$a_i$ is the item discrimination,

$b_i$ is the item difficulty,

$c_i$ is the lower asymptote parameter (also known as the guessing parameter),

exp denotes the natural exponential function, and
D is a scaling factor (usually equal to 1.702).

The Two-PL model is attained if the guessing parameter $c_i$ is constrained to zero for all
items in Equation 1 above.
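Equation 1 can be evaluated directly. The following is a minimal Python sketch, not code from the study; the function name `p_3pl` and the example parameter values are illustrative assumptions. It also shows that setting c to zero gives the Two-PL case.

```python
import math

D = 1.702  # scaling factor quoted in the text


def p_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under Equation 1.

    With c = 0 (the default) this reduces to the Two-PL model.
    """
    logistic = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return c + (1.0 - c) * logistic


# At theta == b the logistic term is 0.5, so P = c + (1 - c) / 2.
print(p_3pl(0.5, a=1.2, b=0.5, c=0.25))  # 0.625
print(p_3pl(0.5, a=1.2, b=0.5))          # 0.5
```

For very low abilities the probability approaches the lower asymptote c, which is why only low-ability examinees carry information about the guessing parameter.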



2. The Generalized Partial Credit Model

Under the GPCM (Muraki, 1992), the probability, $P_{jik}$, of the categorical response
k on item i for an individual j with ability $\theta_j$ is given by the familiar logistic function:

$$P_{jik}(\theta_j) = \frac{\exp\left[\sum_{v=1}^{k} Z_{iv}(\theta_j)\right]}{\sum_{c=1}^{m_i} \exp\left[\sum_{v=1}^{c} Z_{iv}(\theta_j)\right]} \qquad (2.1)$$

$$Z_{ik}(\theta_j) = D a_i(\theta_j - b_{ik}) = D a_i(\theta_j - b_i + d_k) \qquad (2.2)$$

where

$a_i$ is a slope parameter (or item discrimination),
$b_i$ is an item-location parameter (or item difficulty),

$b_{ik}$ is an item-category difficulty parameter, where $b_{ik}$ equals $b_i - d_k$, and
$d_k$ is a step difficulty.

The partial credit model (PCM; Masters, 1982) is a restricted form of the GPCM,
obtained by further constraining the item discrimination index to be identical for all items.

When all items have only two categories to be chosen, the GPCM becomes a Two-PL model.
Only $m_i - 1$ item-category parameters can be identified when the number of response
categories is $m_i$. The item-category difficulty, $b_{i1}$ (or step difficulty, $d_1$), of the first category on
each item is arbitrarily set to zero and the location constraint of

$$\sum_{k=2}^{m_i} d_k = 0 \qquad (2.3)$$

is imposed to eliminate indeterminacy (Muraki, 1992).

Figure 1 shows the GPCM item-category probability curves when $a_i = 1$, $b_{i2} = .5$, $b_{i3} = 1$ and $b_{i4} = 1.5$.

This plot shows that the item-category parameters are the points on the $\theta$ scale at which the
item-category curves of $P_{i,k-1}(\theta)$ and $P_{ik}(\theta)$ intersect. For instance, the first and second category curves
intersect at 0.5 on the $\theta$ scale, at which the value of .5 represents the second-category parameter
($b_{i2}$). The values of the item-category parameters are interpreted as the relative difficulty of category k in
comparison with other categories within an item. The item-category parameters equal a constant (the
item location parameter for this item) minus the corresponding step parameters, which are not
necessarily ordered sequentially within an item (Muraki, 1992). This also technically implies
that no constraint on the order of the item-category (1, 2, 3, ..., k) parameters is required.
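The intersection property described for Figure 1 can be checked numerically. The sketch below is an illustration only (the function name `gpcm_probs` is an assumption, not code from the study). It evaluates Equations 2.1 and 2.2 with the Figure 1 parameters and confirms that adjacent category curves are equal at the corresponding item-category parameter.

```python
import math

D = 1.702  # scaling factor quoted in the text


def gpcm_probs(theta, a, b_cat):
    """Category probabilities under Equations 2.1 and 2.2.

    b_cat lists b_i1 .. b_im; b_i1 is set to 0 by convention, and its value
    cancels out of the ratio in Equation 2.1.
    """
    cumulative, z_sums = 0.0, []
    for b_ik in b_cat:
        cumulative += D * a * (theta - b_ik)
        z_sums.append(cumulative)
    top = max(z_sums)                      # subtract the max for numerical stability
    expz = [math.exp(z - top) for z in z_sums]
    total = sum(expz)
    return [e / total for e in expz]


# Figure 1 parameters: a = 1, b_i2 = .5, b_i3 = 1, b_i4 = 1.5 (b_i1 fixed at 0).
b_cat = [0.0, 0.5, 1.0, 1.5]
probs = gpcm_probs(0.5, 1.0, b_cat)
print(probs[0], probs[1])  # the first and second category curves meet at theta = 0.5
```

The ratio of adjacent category probabilities equals exp[Z_ik(θ)], which is 1 exactly when θ equals b_ik, so the crossing points do not depend on the ordering of the other category parameters.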

When we expect examinees taking a test to find it easier to reach a lower level k-1 in an

item than to reach a higher level k, the frequency of examinees, $F_{i,k-1}$, reaching a lower level k-1

IT * ’ ^ frequency reaching a higher level k in an item (refer to 

Masters, 1982).

Because the frequency, Fj„ is a sufficient statistic for estimating under the PCM model a^ ^ 
STdeL^ f°r the item-category parameters must always 

bu^b,...<b. 

For the GPCM, ordered item-category parameters, $b_{ik}$, are preferred, but are not required. The
order of the item-category values may produce lower SEs of item estimates, as demonstrated by
the three-dimensional graphs later.

B. Principles for Calculating the Asymptotic SEs Without Test Data 

As indicated previously, a variety of legitimate factors can cause errors in the parameter
estimates. This section focuses on those factors that are associated with estimating asymptotic
SEs in the ML parameter estimates (refer to Hambleton, Swaminathan & Rogers, 1993;
Stocking, 1990; Thissen & Wainer, 1982).

The method of sampling subjects can substantially affect the magnitude of the SE in the
estimation of a parameter and cannot be ignored due to the "sample free feature" in IRT. As a
matter of fact, for estimating item difficulty, easy items and hard items are not well estimated if
ability has a bell-shaped distribution centered around the mean ability level. For estimating the
guessing parameter, only low-ability groups are informative. When there are few low-ability
examinees available to estimate the guessing parameter of a relatively easy item, such a
condition makes its SE very large. In addition, the large covariance between the guessing
estimate and the dtfflculty (location ) esttmate then "causes this uncertainty to mo™ pTtlallv to 
the estimate of location" (Thissen & Wainer, 1982, p. 403). For estimating item discrimination

990). Such a condition may be found m a broad distribution of abilities relative to the 

ftl“etr?s~'- -'*> SEs of 

The value of an item parameter itself may have an effect on its SE.
In general, hard items and easy items have larger standard errors, as do the high-
and low-discrimination items. When the distribution of abilities is bell-shaped, the SE of an item
difficulty associated with a high discrimination parameter is lower than that of the same item difficulty
associated with a low discrimination parameter (see Figures 2, 3 and 4, Thissen & Wainer,
1982). Thus, the combination of a set of item parameters for an item should be taken into account
when modeling the SE in the estimation of parameter estimates.

The sample size is also a substantive factor affecting the SEs of parameter estimates. The
larger the sample size, the lower the SE.

The sample size, the shape of the examinees’ ability
distribution and the characteristics of test items can each cause differences in the errors in the
parameter estimates. A mathematical expression for this relationship is given in Appendix A for
dichotomous and polytomous item response data.
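Appendix A is not reproduced in this section. As a rough illustration of the principle, the sketch below uses the standard expected (Fisher) information matrix for a single Three-PL item, integrated over an assumed N(0, 1) ability distribution, and takes the square roots of the diagonal of its inverse as asymptotic SEs. The function names, the 81-point quadrature grid, and the example parameter values (including c = 0.2) are assumptions made for illustration; they are not taken from the study.

```python
import math

D = 1.702  # scaling factor quoted in the text


def normal_grid(n_points=81, lo=-4.0, hi=4.0):
    """Equally spaced nodes with normalized N(0, 1) weights (simple quadrature)."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def invert(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def expected_se_3pl(a, b, c, n_examinees):
    """Asymptotic SEs of (a, b, c) for one item, abilities assumed N(0, 1)."""
    nodes, weights = normal_grid()
    info = [[0.0] * 3 for _ in range(3)]
    for theta, w in zip(nodes, weights):
        L = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
        P = c + (1.0 - c) * L
        grad = [(1 - c) * D * (theta - b) * L * (1 - L),  # dP/da
                -(1 - c) * D * a * L * (1 - L),           # dP/db
                1.0 - L]                                  # dP/dc
        for i in range(3):
            for j in range(3):
                info[i][j] += n_examinees * w * grad[i] * grad[j] / (P * (1 - P))
    cov = invert(info)
    return [math.sqrt(cov[i][i]) for i in range(3)]


# Hypothetical item; no examinee responses are needed.
se_a, se_b, se_c = expected_se_3pl(a=1.2, b=0.5, c=0.2, n_examinees=1000)
print(se_a, se_b, se_c)
```

Because the information matrix is proportional to the sample size, quadrupling the number of examinees halves every SE, which matches the statement that larger samples give lower SEs.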

III. Methodology 

A. Three-D Graphical Presentation to the Analytically Asymptotic SEs 

As indicated, the combination of item characteristics (e.g., difficulty discrimination and 

r teee n f" each^oVroTs “1 ' 

estimates. The three-D graphical presentation will be used to illustrate this issue.

A A- ® functions presented in Thissen and Wainer’s study (1982) and in 

fiiSn of the mT AEA-SEs of a set of ML item estimates for an item are a 

Straifd^^^^^iH^ examinees’ abilities. Here, the 

100? Soo or .nnn^ how large (e.g., 

s^?ie Te’tn th ’■^^°''^hle for item parameter estimation? The sample size ratio (SSR, 
sample size to the number of item parameters; refer to De Ayala & Sava-Bolesta, 1999) can be a
more objective index to resolve this issue than sample size alone and was employed in this study.
For instance, results from De Ayala and Sava-Bolesta (1999) indicated that an SSR of 10:1 can
yield reasonably accurate parameter estimates for the nominal response model (Bock, 1970)
when abilities are normally distributed. This standard (SSR = 10:1) is used in this study.

1 9 nn SSR equals 10:1 and a 40-item test is constructed, the sample sizes of 800 and 

necessary tor the 40 four-category scored GPCM items. 

B. Comparisons between the AEA-SEs and the EMB-SEs 

1. Test Data Generation
The simulated item parameters were from the Algebra Assessment, designed and scored
by the Educational Testing Service (1998, Algebra End-of-course Examination Report). This
test consists of 24 multiple-choice items, 8 short-response dichotomously scored items and 10
constructed-response items (3 three-category items, 3 four-category items and 4 five-category
items).

The simulated test data were generated by the computer program, RESGEN2.1 (Muraki, 
1997). The computer program, PARSCALE (Muraki & Bock, 1996) was used for item 
calibration. The computer program, EQUMIXED (Li, 1999), was used for computing the item 
linking coefficients for a mixed-format test. These linking coefficients were then used to convert 
the scale of the parameter estimates to the scale defined by the true parameters. The sample size was
set to 1290 to meet the requirement of SSR = 10:1. The number of replications was set to 50.
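The sample size of 1290 follows from counting item parameters. The sketch below assumes 3 parameters per Three-PL item, 2 per Two-PL item, and one slope plus m - 1 item-category parameters per m-category GPCM item; the function name is illustrative.

```python
def n_item_parameters(n_3pl, n_2pl, gpcm_categories):
    """Count free item parameters for a mixed-format test.

    3 per Three-PL item, 2 per Two-PL item, and 1 slope + (m - 1)
    item-category parameters per m-category GPCM item.
    """
    gpcm = sum(1 + (m - 1) for m in gpcm_categories)
    return 3 * n_3pl + 2 * n_2pl + gpcm


# 24 multiple-choice, 8 short-response, and 10 constructed-response items
# (3 three-category, 3 four-category, 4 five-category).
gpcm_categories = [3] * 3 + [4] * 3 + [5] * 4
n_params = n_item_parameters(24, 8, gpcm_categories)
print(n_params, 10 * n_params)  # 129 1290
```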

2. Calculating the Empirical MMLE/Bayesian SEs

The EMB-SE was obtained by using the following steps.

(1) Generate a test dataset by the procedures indicated previously;

(2) Simultaneously fit the Two-PL, Three-PL and GPCM models to the appropriate item responses
and calibrate item parameter estimates, using the MMLE/Bayesian estimation method;

(3) Transform the metric of the estimated parameters to the one defined by the true parameters;

(4) Repeat steps 1 through 3 a large number of times, which results in a large number of
estimates for each individual parameter; and

(5) Calculate the BIAS and RMSE (root mean squared error) for each of the parameter
estimates by the formulas shown below.



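$$\mathrm{BIAS}(H_i) = \frac{\sum_{k=1}^{r} (\hat{H}_{ik} - H_i)}{r} \qquad (5)$$

and

$$\mathrm{RMSE}(H_i) = \sqrt{\frac{\sum_{k=1}^{r} (\hat{H}_{ik} - H_i)^2}{r}} \qquad (6)$$

where $H_i$ is the true item parameter, $\hat{H}_{ik}$ is the corresponding estimated item parameter
in replication k, and r is the number of replications (r equals 50 in this study).

RMSE is a measure of the total error of estimation, which consists of the systematic error (BIAS)
and random error (SE). These three indexes are related to each other as follows:

$$\mathrm{RMSE}(H_i)^2 \cong \mathrm{SE}(H_i)^2 + \mathrm{BIAS}(H_i)^2 \qquad (7)$$

The empirical MMLE/Bayesian SE of an item estimate is approximately estimated by

$$\mathrm{SE}(H_i) \cong \sqrt{\mathrm{RMSE}(H_i)^2 - \mathrm{BIAS}(H_i)^2} \qquad (8)$$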



Another method to estimate the SE of an item estimate is to directly compute the standard 
deviation of the 50 item estimates obtained from Step 4. This method was not adopted in this 
study because the SE index as well as BIAS and RMSE indices are all important indices to be
used for evaluating the measurement errors of item estimates. 
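Step 5 and Equations 5 through 8 can be sketched in a few lines of Python. The function name and the toy replicate values are illustrative assumptions, not data from the study. The last line checks the RMSE decomposition against the replication-based SE (.743) and BIAS (.782) quoted in the Results for one Three-PL difficulty estimate.

```python
import math


def error_indices(estimates, true_value):
    """BIAS, RMSE, and SE of replicated estimates (Equations 5, 6, and 8)."""
    r = len(estimates)
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / r
    rmse = math.sqrt(sum(err * err for err in errors) / r)
    # Guard against tiny negative values caused by floating-point rounding.
    se = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    return bias, rmse, se


# Toy replicates of one parameter whose true value is 1.0 (hypothetical numbers).
bias, rmse, se = error_indices([1.0, 1.2, 0.8, 1.4], true_value=1.0)
print(bias, rmse, se)

# Equation 7 applied to the SE and BIAS reported for one item in the Results.
print(round(math.sqrt(0.743 ** 2 + 0.782 ** 2), 3))  # 1.079
```

With the divisor r used in Equations 5 and 6, the SE from Equation 8 equals the standard deviation of the replicated estimates computed with divisor r, so the two ways of obtaining the SE mentioned in the text agree.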

3. Calculating the AEA-SEs 

The same 42 sets of item estimates were also used to generate the AEA-SEs of item 
estimates. The estimated posterior distribution of abilities reported from the PARSCALE output 
during the item calibration process was used to define the latent distribution of abilities. The 
same sample size, 1290, used to generate item response data, was used here. 

4. Data Analysis 

Descriptive statistics of the SE index of item parameter estimates for AEA and EMB
were calculated. Because the same set of item parameters was repeatedly used for estimating
their SEs across research conditions, the Log[SE] (Harwell, Stone, Hsu & Kirisci, 1996) of each
of the various item parameter estimates was treated as a repeated measure across research
conditions. A t-test for dependent observations was then performed to compare the impact of the
estimation method on the precision of SE estimates for item parameters.

The Pearson correlation coefficient between AEA-SE and EMB-SE measures, across test 
items, was calculated for each of various item estimates. Similar calculations were performed for 
the correlations between BIAS and AEA-SE, across test items, as well as BIAS with EMB-SE. 
The plots of SEs of item estimates as a function of true item parameters for the AEA and EMB 
were graphed. 
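The two comparisons described above can be sketched with the standard library alone. This is an illustration, not the study's analysis: the helper names are assumptions, the natural logarithm is assumed for Log[SE], and the input is only the eight EMB-SE and AEA-SE pairs quoted in the Results for two GPCM items, so the output does not reproduce the statistics in Table 1.

```python
import math


def paired_t(x, y):
    """t statistic and degrees of freedom for dependent observations."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# EMB-SE and AEA-SE pairs quoted in the Results for two GPCM items.
emb = [0.26, 0.20, 0.64, 0.62, 0.07, 0.11, 0.21, 0.39]
aea = [0.22, 0.18, 0.64, 0.63, 0.05, 0.10, 0.17, 0.33]

t_stat, df = paired_t([math.log(v) for v in emb], [math.log(v) for v in aea])
r = pearson_r(aea, emb)
print(t_stat, df, r)
```

Taking logs before the t-test compares the SEs on a ratio scale, so a difference between 0.05 and 0.07 counts as much as one between 0.5 and 0.7.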

IV. Results 

A. Using 3-D Presentation to Explore the AEA-SEs of Item Estimates 

1. Three-PL Logistic Model 

For the Three-PL model, Figures 2a and 2b present plots of SEs of item discrimination
and difficulty as the bivariate function of both item estimates while the guessing parameter was
set to a constant of 0.25. Figure 2a is for the SEs of the b-parameters. It indicates that easy items are more
likely to have more measurement error, as highlighted by Thissen and Wainer (1982). The SEs of
the difficulty parameters can reach an unacceptable magnitude, for instance, larger than 2 for an
item with a set of parameters (a = 1.3, b = 0 and c = .25).

Figure 2b turns its focus on the SEs of a-parameters. This plot raises an interesting issue 
that the SE of an a-parameter becomes very high when the same item’s b-parameter value is 
extreme (e.g., too hard or too easy). Fortunately, this combination of item parameters (high 
discrimination with very high or very low difficulty ) does not usually occur in real testing 
situations because the likelihood of producing a high discrimination parameter seems to be rare 
for a too hard or too easy item. 

Figure 2c presents SEs of the guessing parameters as the bivariate function of the 
guessing and difficulty estimates, when the a-parameter was set to a constant of 1.5. This plot 
clearly indicates that the problem of estimating the guessing parameters occurs when an item is 
relatively easy. 

2. Two-PL Logistic Model 

For the Two-PL model, Figures 3a and 3b present plots of the SEs of item estimates as the
bivariate function of both difficulty estimates and discrimination estimates. Figure 3a is for the
of the Three-PL model) shows that the problem

with difficulty estimates for the Two-PL was not as serious as for the Three-PL. In

AeTali of SEfo When 

L ' b-parameter reaches .7, it suggests that this b-parameter’s 

combination with the a-parameter is one of the unrealistic combinations of a set of item estimates
(e.g., a hard item with a high discrimination parameter).

Thr^^ f connotes a special meaning for the Three-PL model. When the 

Three-PL c-parameter is perfectly estimated, the c-parameter estimate no longer affects the
estimates of the b-parameter and a-parameter. Under this circumstance, the Three-PL model can be
analogous to the Two-PL model so that the maximum possible value of the SE for the
b-parameter is expected to be about 0.7, under SSR = 10:1. Therefore, when a SE of an item’s

perspective, there are three possible reasons:
(a) the c-parameter is poorly estimated, (b) a worse or unrealistic combination of a

set of item estimates (a and b) for this item occurred, and (c) both factors combined.

Figure 3b is for the SEs of a-parameters. This plot is very similar to Figure 2b,

except that the magnitudes of SEs are relatively smaller in

the Two-PL model than the Three-PL model for the high-discrimination hard items.

3. Generalized Partial Credit Model

Figure 4a is graphed to explore the relationship between the SEs of the categorv- 

h category-parameter (b,) and the discrimination 

parameter, when the category-parameters h,, and were set to constants of -1 and 0. On the spot 

<h ^ c? ° f category-parameters are ideally ordered (b , 

bi 3 bj, the SEs of bj^ become relatively small; otherwise, the SEs of bj^ could become 
enormously large when the category-parameters are not ideally ordered, e.g., b. > b , and b- 
bi4. Comping Figure 4a with Figure 4b where the category-parameters bj 3 and b , were set to 

f °TthTth ^ estimates (Figure 4b) become relatively small The 

fact that their item-category estimates are ordered might be one of the causes. 

With respect to the SEs of the discrimination estimates, Figure 4c was plotted with the
same conditions as those used for Figure 4b. This figure shows that the higher the values of the
discrimination parameter, the higher the SE produced, especially with the condition of
extreme $b_{i2}$ values. This phenomenon was also found in the Three-PL and Two-PL models.

The plots of SEs of item estimates for the Three-PL, Two-PL and GPCM

models imply that models without the guessing parameter are more likely to have item
parameters that are more precisely estimated.

B. Comparisons Between the AEA-SEs and the EMB-SEs 

1. The Three-Parameter Logistic Model

Figure 5 presents plots of the SE as a function of true parameters for the Three-PL item
discrimination (a), difficulty (b) and the guessing parameter (c), under the AEA and
EMB methods. The Three-PL section of Table 1 shows summary descriptive statistics for the
SE, computed across 24 items, for each method.

Figure 5 shows that the results from the EMB and AEA methods were similar except for an
extreme case of a set of item estimates for an item. Originally, this set of item parameters

r T Ki o! C-.318) were calibrated from real test data with sample size equal to 6426 
(see Table 2). Intuitively, the SE of b-parameter from the BILOG output is relatively large, 581 
SSR=10:l, where th^^tors of^2 r. '”8=^ ‘han .70 under N=1290 or

item estimates (parametets b and at for an^!^* estimate and a worst combination of a set of 

Figure 2a indicates thts combirata 1 2 and b - mris' o^ *“ ^ 

‘Sifset'^oTr «nrb'r 

N-t oom .u , ■ '.™ "'“'■“ repeatedly esttmated 50 times under SSR=10: 1 (or 

N = 1290), the replication-based SE (.743), BIAS (.782) and RMSE ($\sqrt{(.743)^2 + (.782)^2} = 1.079$)

im;iy'toXn"n“™^^^^ 

b-parameter estimate is expected Whf*n «,a fi.rtUe. large, larger BIAS value for this 

absolute value of BIAS for «>e 

were very high, e.g., with AEA-SE = .97 a^d wifh Ehffi SE =“l9 

was found for the c-parameter Since the item * Table 3). Similar result 

these results suggest thatThtn ftoLr SF o^i^ T interrelated, 

found, large Blfs foX-^S be e " 

type of item can not be used for testing situatioL due t? ^ tl' ®^‘'nta'as for this 

The Three-PL section of Tabte oTl'f ““ ‘"““'“"'p 

across the 24 items from the AE A methnH i • estimates (a, b and c), 

reported for the EMB method. The dependeiuTstatifdc Jt^^ corresponding indices 

parameter showed statistically simiific!nt ?ff I ® ^ ^^S[SE] of the a- 

It seems that there if no f T"" '^e EMB methods, 

difference (.01) is minimal. No significam fikreVcet wS difference since this 

The correlation coefficients between AEA-SEs and EMB-SEs, across the sets of item
parameters, were .90, .89 and .91 for the parameters a, b and c, respectively.

SEs of iterestoto eTcSal l^A^m'igh! ^ ^‘tniinr 

Of Tbree-PE item par^^Z; ^l^nSrEtr ^ lb »dt 
a close examination for those sets nf item rtarar« t ^ 2c. If this exception occurs, 

corresponding BIAS and RMSE values ndght also b“dy faTgf 



2. The Two-PL Logistic Model

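Figure 6 presents plots of the SE as a function of true parameters for the Two-PL item
discrimination (a) and difficulty (b), under the AEA and EMB methods. The Two-PL
section of Table 1 shows summary descriptive statistics for the SE, computed across 8 items, for
each method.

The Two-PL section of Table 1 shows that the average SE of parameters a and b from the AEA method was
slightly lower than those produced from using EMB. No significant differences were
found. The correlation coefficients between AEA-SEs and EMB-SEs, across 8 items, were .97
for both the parameter a and b estimates.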
3. The Generalized Partial Credit Model

Figure 7 presents plots of the SE as a function of true parameters for the GPCM item
discrimination (a), and the item-category difficulties ($b_{ik}$), under the AEA and EMB methods.

The GPCM section of Table 1 shows summary descriptive statistics, computed across all GPCM
items, for the SE for each method.

Figure 7 shows that the results from EMB and AEA methods were similar. Table 1 shows
that the average SE of parameters a and b from the AEA method was slightly lower than those
produced from using EMB. Significant differences were found (Table 1), except for the item-category
parameters $b_{i4}$ and $b_{i5}$. The correlation coefficients between AEA-SEs and EMB-SEs, across test
items, were .97, .93, .94, .99 and .99 for the a-parameter and the item-category parameters $b_{i2}$, $b_{i3}$, $b_{i4}$
and $b_{i5}$, respectively.

Two-PL is a special case of the GPCM model. In general, AEA consistently produced 
lower SE values for each of the various item estimates than EMB for these two models. Table 1 
shows no significant differences in SEs for the Two-PL model were found, but several 
significant differences occurred for the GPCM model. 



As indicated in Equations 3 and 4, when a GPCM item has the characteristic of easier
success at a lower level than at a higher level for examinees, the item-category difficulties (or
frequencies of examinees reaching any category within an item) will generally be ordered. The
AEA three-D graphs presented previously imply that an item with this characteristic could be
one of the causes of stabilized item-category estimates. The values of EMB-SEs of item-category
parameters from some items also support this point. For instance, the set of item-category
parameters for a GPCM item used in this study was $b_{i2} = 2.83$, $b_{i3} = -0.69$, $b_{i4} = 6.40$ and
$b_{i5} = 2.32$. These item-category parameters were not ordered. Their corresponding EMB-SEs (or
AEA-SEs) were relatively larger: 0.26 (0.22), 0.20 (0.18), 0.64 (0.64) and 0.62 (0.63) (note:
values in parentheses were AEA-SEs).

In another example in which a set of item-category parameters was ordered ($b_{i2} = -0.41$,
$b_{i3} = 1.76$, $b_{i4} = 2.41$ and $b_{i5} = 2.91$), their corresponding EMB-SEs (or AEA-SEs) were relatively
smaller: 0.07 (0.05), 0.11 (0.10), 0.21 (0.17) and 0.39 (0.33).

Considering the Two-PL as a member of the GPCM model, the AEA and EMB, in 
general, produce very similar SEs of item estimates for the test data evaluated in this study. 



V. Applications and Conclusions

A. General Applications 

For the practical application, Thissen and Wainer (1982) illustrated how to use the AEA-SE
for the determination of the sample size required to yield the desired accuracy for any set of
item parameters. As demonstrated in Figures 2, 3 and 4, this type of application should be more
meaningful and practical when any unreasonable combination of a set of item estimates for an
item (e.g., a hard item with a high or low discrimination parameter) is excluded. As a matter of fact,
this unrealistic combination of item estimates is rarely found in real data. If it occurs, the level of
precision for these item estimates would be questionable.

Tabulating the AEA-SEs of item estimates under some conditions (e.g., different sample
sizes, levels of item difficulty or discrimination, etc.) is another means to provide test
practitioners a sense of the accuracy of parameter estimates, on which SEs of item estimates are
yielded under a specific situation (refer to Thissen & Wainer, 1982).
The three-D graphical presentation for AEA-SEs of item estimates can be used for
detecting the possible problems of applying the ML estimation method to the IRT models of interest.
Using this graphical procedure, Thissen and Wainer (1982), along with this study, have pointed
out that the widely used Three-PL has a potential problem obtaining accurate ML item estimates.
Fortunately, this is not the case for the currently popular estimation method, MMLE/Bayesian, which
minimizes this problem for the Three-PL model.

The three-D graphical analyses for the GPCM suggest that, when constructing
GPCM test items, easier success at a lower level in an item than at a higher level for examinees
needs to be considered. This principle, in general, makes the item-category difficulties ordered,
which may decrease the variation of item-category estimates.

AEA tends to produce relatively larger SEs for some combinations of Three-PL item 
parameters (^_g ^58, b=.l 13 and c=.318; also refer to figure 2) than the replication-based 
approach or BILOG. Although this fact is AEA’s drawback, we might make use of it to
identify questionable Three-PL item parameter estimates when we have trouble deciding
whether the numerical SE values generated from the BILOG output are large enough to be
identified as unstable and inaccurate item estimates. More specifically, when we find
sets of item estimates with large SEs generated by IRT computer software, we might
also calculate the corresponding AEA-SEs for these sets of item estimates. If the magnitudes of the
AEA-SEs of b-parameters are large (e.g., larger than .7 under SSR = 10:1), we might suspect that
the set of item estimates is unstable and inaccurate and cannot be used in real testing situations.
For the data examined in this study, we also find that Three-PL item estimates with
larger AEA-SEs also have larger BIAS values. This finding suggests that the AEA might be a
useful tool to identify which sets of item estimates are contaminated with large BIAS. The
issue of how well the AEA approach can be used to flag biased item estimates for the Three-PL
model needs to be closely examined.

B. Application to Item-Linking Studies 

Several unidimensional IRT equating methods have been developed for placing test items,
separately calibrated, on a common scale under the common-item linking design (Vale,
1986), in which tests containing a set of common items are administered to two groups of
examinees. They can be grouped as the mean/sigma method (Marco, 1977), the mean/mean
method (Loyd & Hoover, 1980), the item characteristic curve method (Haebara, 1980), the test
characteristic curve method (Stocking & Lord, 1983), the minimum chi-square method (Divgi,
1985), and the numerical integration method (Zeng & Kolen, 1994). Suppose we attempt to explore
which method is the most robust to the random error of item estimates for polytomously
scored tests. This type of research is very time-consuming when the random error of item
estimates is manipulated by the replication approach described in the previous section on
Methodology. In contrast, if the random error of item estimates is manipulated by the
analytic-based approach, researchers can take advantage of its significant features described
previously.

Conceptually, an item estimate has three components: the true item parameter value, a
random error, and bias. When bias is assumed to be zero and a set of true item parameters for an
item is given, the procedure of adding "reasonable numerical values as random errors" to this
set of true parameters to form a set of item estimates is illustrated below.

When the latent trait distribution of 1000 examinees' abilities is distributed as N(0, 1), the
variance-covariance matrix, V shown below, for a set of item parameters, a=1.2, b=0.5 and
c, can be predicted using Equation 13:

\[
V =
\begin{array}{c|ccc}
  & a     & b     & c     \\ \hline
a & .0239 & .0058 & .0034 \\
b & .0058 & .0067 & .0021 \\
c & .0034 & .0021 & .0012
\end{array}
\]

The random errors for the item estimates, a, b and c, are the diagonal elements of matrix E.
For example, E may be

\[
E =
\begin{bmatrix}
-.0901 &  .0051 & -.0134 \\
 .0051 & -.0302 &  .0222 \\
-.0134 &  .0222 &  .0493
\end{bmatrix}
\]

It is noted that matrix E is randomly generated from the MVN(0, V), so that the values of
its elements vary across replications. A set of simulated item estimates is formed by adding
the errors in E to the set of true parameters a, b and c. Theoretically, when a large number
of replications is conducted, the standard deviations of the simulated item estimates, a, b
and c, will come closer and closer to the expected SEs of parameters a, b and c. They are
.155, .082 and .034.
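
The error-generation step described here can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the program used in the study: it assumes NumPy, uses the matrix V given in the text, and takes c = 0.2 as a purely hypothetical value for the guessing parameter (the spread of the simulated estimates does not depend on the true values). It draws error vectors from MVN(0, V), adds them to the true parameters, and compares the SDs of the simulated estimates with the square roots of the diagonal elements of V.

```python
import numpy as np

# Variance-covariance matrix V for (a, b, c), as given in the text
# (N = 1000 examinees, ability distributed as N(0, 1)).
V = np.array([
    [0.0239, 0.0058, 0.0034],
    [0.0058, 0.0067, 0.0021],
    [0.0034, 0.0021, 0.0012],
])

# True item parameters; c = 0.2 is a hypothetical placeholder, not a value from the study.
TRUE_PARAMS = np.array([1.2, 0.5, 0.2])


def simulate_item_estimates(true_params, varcov, n_replications, seed=0):
    """Add MVN(0, varcov) random errors to the true parameters (bias assumed to be zero)."""
    rng = np.random.default_rng(seed)
    errors = rng.multivariate_normal(np.zeros(len(true_params)), varcov, size=n_replications)
    return true_params + errors


estimates = simulate_item_estimates(TRUE_PARAMS, V, n_replications=10_000)
empirical_sd = estimates.std(axis=0, ddof=1)  # SDs of the simulated a, b, c estimates
expected_se = np.sqrt(np.diag(V))             # analytic SEs implied by V

print("empirical SD:", np.round(empirical_sd, 3))
print("expected SE :", np.round(expected_se, 3))
```

With more replications the two printed rows should agree more closely, which is the convergence property the paragraph above relies on.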

This type of modeling of the random errors of item estimates is much easier to employ for some
item-linking studies, such as the one conducted by Li and Lissitz. Suppose that, among the
existing MIRT item-linking methods, we attempt to examine which method is relatively less
sensitive to the random (or sampling) error of item estimates. The simulation can proceed as
follows:

1. Choose a set of true item parameters for the base test.

2. Choose a set of linking coefficients to obtain the true item parameters for the linked test.

3. Simulate item estimates for each set of true item parameters. Each simulated item estimate
from a set of true item parameters is computed by adding a random error to the true value.

4. Estimate the equating coefficients based on the two sets (base and linked) of item parameter
estimates.

5. Repeat steps 3 and 4 for each replication and summarize the estimated equating coefficients.

An example conducted by Li, Lissitz and Yang (1999) examined whether
the principle of matching test characteristic surfaces between the base and linked tests is
appropriate for mixed format tests for finding the linking coefficients. Since the AEA had been
employed for modeling errors of item estimates, an enormous number of replications (1000) for
each research condition became available. Therefore, the sampling distribution for each
linking coefficient could be more accurately estimated as the number of replications increased
to 1000. However, an unexpected result indicated in the introduction can occur. Table 4, excerpted
from that study, presents the average value of the Three-PL item parameters for the sets of 10, 15
or 20 items and the corresponding average AEA-SE. The fact that the longer tests (15 or 20
items) had larger average AEA-SEs than the shorter (10 items) test caused this unexpected result
to occur. Consequently, when using the AEA approach to model random errors of item estimates,
interpreting the research results should be done cautiously.
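
A replication loop of the kind used in such item-linking simulations can be sketched as follows. This is a simplified illustration, not the design of Li, Lissitz and Yang (1999): it swaps in the unidimensional mean/sigma method (Marco, 1977) in place of the test characteristic surface approach, and the item difficulties, the true linking coefficients (1.2 and 0.5), and the single common SE of .08 standing in for item-specific AEA-SEs are all hypothetical values chosen for the example.

```python
import numpy as np


def mean_sigma(b_base, b_linked):
    """Mean/sigma coefficients that place linked-test b values on the base scale."""
    slope = np.std(b_base, ddof=1) / np.std(b_linked, ddof=1)
    intercept = np.mean(b_base) - slope * np.mean(b_linked)
    return slope, intercept


def simulate_linking(n_replications=1000, se_b=0.08, seed=0):
    rng = np.random.default_rng(seed)
    # True b-parameters of the common items on the base scale (hypothetical values).
    b_base_true = np.linspace(-2.0, 2.0, 20)
    # True linking coefficients (hypothetical) define the linked-test parameters.
    true_slope, true_intercept = 1.2, 0.5
    b_linked_true = (b_base_true - true_intercept) / true_slope
    results = np.empty((n_replications, 2))
    for r in range(n_replications):
        # Add random errors; one common SE stands in for item-specific analytic SEs.
        b_base_hat = b_base_true + rng.normal(0.0, se_b, b_base_true.size)
        b_linked_hat = b_linked_true + rng.normal(0.0, se_b, b_linked_true.size)
        # Estimate the linking coefficients from the two sets of simulated estimates.
        results[r] = mean_sigma(b_base_hat, b_linked_hat)
    # Summarize the sampling distribution of the coefficients across replications.
    return results.mean(axis=0), results.std(axis=0, ddof=1)


mean_coef, sd_coef = simulate_linking()
print("mean slope, intercept:", np.round(mean_coef, 3))
print("SD   slope, intercept:", np.round(sd_coef, 3))
```

Because no calibration run is needed inside the loop, a thousand replications cost almost nothing, which is the practical advantage the paragraph above describes.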

Considering the MMLE/Bayesian as a standard method for item estimates, the analytic
approach to computing the SEs of ML item parameter estimates, in general, underestimates the
variation of item estimates for the GPCM model. For the Three-PL model, the analytic approach
tends to overestimate the variation of item estimates. Although those differences can be minor
for most item estimates, some extreme differences can occur, especially for
the Three-PL model. Based on the results from this study, the extreme cases most often came from
those with unreasonable combinations of sets of item estimates. Those cases will produce larger
measurement errors and should be excluded from the studies.

Using AEA for modeling random errors of item estimates has its theoretical limitations.
As indicated, measurement errors of item estimates are assumed to be distributed as MVN (S,
V). AEA is the analytic approach to model the "units of measurement errors for item estimates"
(known as SEs of item estimates, associated with the matrix V). Besides that, modeling the
"points of origin of measurement errors for item estimates" (known as the BIAS of item
estimates, indicated by the vector S) is another key issue to be considered. Although we might
assume S to be 0, for simplicity, ML is a biased estimator (Anderson & Richardson, 1979) and
the degree of bias depends upon the sample size. This issue of modeling S needs to be further
explored in the future for better prediction of measurement errors of item estimates. Up to this
point, when the bias of item estimates may have a strong effect on the research topics being
investigated, the AEA approach to modeling measurement error is not appropriate.

C. Application on Estimating Standard Errors for IRT-true Score Equating Scores 

As indicated, without using examinees' responses on the test, the expected SEs of item
estimates can be predicted. Similarly, the expected SEs of IRT true-score equating scores (Lord,
1982) can also be predicted without using examinees' response patterns on the test. In other
words, if a new test is generated from an item pool and includes some anchor items used in the
previous (old) test form, a look-up table for converting the new test scores into the
corresponding old-test scores can be generated. After that, the SE for each of the new test scores
can be computed when the information of the AEA-based var-cov matrix of item estimates (see
Equation 13) is incorporated with Lord's formula (1982). It is noted that Lord used the observed
var-cov matrix of item estimates, generated from real test data, rather than the analytic one. The
formula for the SEs of IRT true-score equating can be found in Lord's study (1982).

The formula-based SEs of test scores are compared with those produced by the bootstrap approach
in a true-score equating study (Tsai, 1998). The bootstrap approach (Kolen & Brennan, 1995) was
used to compute the SEs of the equated scores under that study's design. The examinees' raw
scores fell in the upper part of the possible score range (from 67 upward), so Figure 8 only
presents the similarity of the SEs of scores calculated by both approaches for this score range.

Finally, the SEs yielded from the AEA and the EMB approaches were very similar in
most cases, especially for the GPCM model or models without the lower asymptote parameter.
This finding indeed encourages test practitioners and researchers to apply the asymptotic SEs
of item estimates in the applications discussed in this study.
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Appendix A 

1. For the Dichotomous IRT Models 

For an item i, the likelihood of the observed dichotomous responses for N independent
examinees is (refer to Thissen & Wainer, 1982):

\[
L_D = \prod_{j=1}^{N} P_j^{u_j} (1 - P_j)^{1 - u_j} \tag{9}
\]

where P_j can be calculated from a three-parameter model, u_j=1 for a correct response and
u_j=0 for an incorrect response. The loglikelihood of Equation 9 is

\[
\log L_D = \sum_{j=1}^{N} \left[ u_j \log(P_j) + (1 - u_j) \log(1 - P_j) \right] \tag{10}
\]

The maximum likelihood estimates of each parameter (e.g., a_i, b_i, c_i for the Three-PL model) are
located where the partial derivatives of Equation 10 are zero. For ease of expression, ξ represents
the item parameters of the three-parameter model (a, b, c). Given a density of θ, φ(θ) (e.g., the
normal distribution, N(0, 1)), for a pair of parameters ξ_s and ξ_t, the negative expected value of
the second derivative of the loglikelihood function, Equation 10, has the form (refer to Thissen &
Wainer, 1982)

\[
-E\left( \frac{\partial^2 \log L}{\partial \xi_s \, \partial \xi_t} \right)
= N \int_{-\infty}^{\infty} \frac{1}{PQ}
  \frac{\partial P(\theta)}{\partial \xi_s}
  \frac{\partial P(\theta)}{\partial \xi_t}
  \, \phi(\theta) \, d\theta \tag{11}
\]

where E is the expectation and Q=1-P. Equation 11 requires the derivatives of P(θ) with
respect to the item parameters. The numerical approximation of the integral in Equation 11 can be
calculated by the Gauss-Hermite quadrature and is presented in Equation 12:

\[
I_{3PL}(\xi_s, \xi_t) \approx N \sum_{q=1}^{40}
  \left[ \frac{1}{P(X_q) Q(X_q)}
  \frac{\partial P(X_q)}{\partial \xi_s}
  \frac{\partial P(X_q)}{\partial \xi_t} \right] A(X_q) \tag{12}
\]

where X_q is a quadrature point in the ability dimension, q indexes the quadrature points in the
ability dimension, and A(X_q) is the corresponding quadrature weight. The number of
quadrature points for numerical integration is set to 40 in this study.
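
As a small check on the quadrature step, the points X_q and weights A(X_q) for a N(0, 1) ability density can be generated with NumPy. This is a sketch under an assumption: it uses the probabilists' form of Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`) and rescales the weights to sum to 1; the quadrature routine actually used in the study is not stated in the text.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_quadrature(n_points=40):
    """Quadrature points X_q and weights A(X_q) for a N(0, 1) ability density."""
    points, weights = hermegauss(n_points)          # weight function exp(-x**2 / 2)
    return points, weights / np.sqrt(2.0 * np.pi)   # rescale so the weights sum to 1


X, A = normal_quadrature(40)
second_moment = np.sum(A * X**2)  # approximates E[theta^2] = 1 under N(0, 1)
fourth_moment = np.sum(A * X**4)  # approximates E[theta^4] = 3 under N(0, 1)
print(A.sum(), second_moment, fourth_moment)
```

Any integrand in θ, such as the bracketed term of Equation 12, is then approximated by the same weighted sum over the 40 points.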

The partial derivatives of P(X) with respect to its parameters can be obtained using the difference
approximation (Nakamura, 1996) and substituted in Equation 12 to give a 3 x 3 (for the
three-parameter model) information matrix corresponding to the triplet of item parameters (a, b,
and c).

The inverse of that information matrix is the asymptotic variance-covariance matrix of the three
parameters and is given in Equation 13. The square roots of the diagonal elements of the
variance-covariance matrix are the asymptotic standard errors of the parameters.

\[
\mathrm{VarCov}_{3PL} =
\begin{bmatrix}
I_{3PL}(a,a) & I_{3PL}(a,b) & I_{3PL}(a,c) \\
I_{3PL}(b,a) & I_{3PL}(b,b) & I_{3PL}(b,c) \\
I_{3PL}(c,a) & I_{3PL}(c,b) & I_{3PL}(c,c)
\end{bmatrix}^{-1} \tag{13}
\]
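
The chain from Equation 11 to Equation 13 can be sketched in code. This is a minimal illustration under stated assumptions rather than the study's program: the Three-PL curve is taken as P = c + (1 - c) / (1 + exp(-D a (θ - b))) with scaling constant D = 1.7, the quadrature uses NumPy's probabilists' Gauss-Hermite rule rescaled for a N(0, 1) density, the derivatives use a central difference with a hypothetical step of 1e-5, and the example item (a = 1.2, b = 0.5, c = 0.2) is hypothetical. The function name `aea_varcov_3pl` is introduced here for illustration only.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

D = 1.7  # assumed scaling constant; the text does not state which metric was used


def p3pl(theta, a, b, c):
    """Three-PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def aea_varcov_3pl(a, b, c, n_examinees, n_quad=40, h=1e-5):
    """Asymptotic var-cov matrix of (a, b, c): inverse of the expected information."""
    x, w = hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)              # weights for a N(0, 1) ability density
    params = np.array([a, b, c], dtype=float)
    # Guard against P or Q underflowing to exactly 0 at extreme quadrature points.
    p = np.clip(p3pl(x, *params), 1e-12, 1.0 - 1e-12)
    grads = []
    for s in range(3):                         # central-difference dP/d(xi_s)
        step = np.zeros(3)
        step[s] = h
        grads.append((p3pl(x, *(params + step)) - p3pl(x, *(params - step))) / (2.0 * h))
    grads = np.array(grads)                    # shape (3, n_quad)
    # Quadrature sum of (1 / PQ) * dP/d(xi_s) * dP/d(xi_t) * A(X_q), times N.
    info = n_examinees * (grads * (w / (p * (1.0 - p)))) @ grads.T
    return np.linalg.inv(info)


varcov_1000 = aea_varcov_3pl(1.2, 0.5, 0.2, n_examinees=1000)
varcov_4000 = aea_varcov_3pl(1.2, 0.5, 0.2, n_examinees=4000)
se_1000 = np.sqrt(np.diag(varcov_1000))        # asymptotic SEs of a, b, c
se_4000 = np.sqrt(np.diag(varcov_4000))
print("SEs at N=1000:", np.round(se_1000, 3))
print("SEs at N=4000:", np.round(se_4000, 3))
```

Because N enters the information only as a multiplier, quadrupling the sample size halves every SE, and no examinee responses are needed at any point.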



2. For the Polytomous IRT Models 

For an item i, the likelihood of the observed polytomous responses for N independent
examinees is

\[
L_P = \prod_{j=1}^{N} \prod_{k=1}^{m} P_{jk}^{u_{jk}} (1 - P_{jk})^{1 - u_{jk}} \tag{14}
\]

where P_jk can be calculated from a GPCM model, u_jk=1 for a response in category k and
u_jk=0 for responses other than category k. The loglikelihood of Equation 14 is

\[
\log L_P = \sum_{j=1}^{N} \sum_{k=1}^{m}
  \left[ u_{jk} \log(P_{jk}) + (1 - u_{jk}) \log(1 - P_{jk}) \right]
\]

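The information for a pair of GPCM item parameters, ξ_s and ξ_t, is given below. That is,

\[
-E\left( \frac{\partial^2 \log L_P}{\partial \xi_s \, \partial \xi_t} \right)
= I_{GPCM}(\xi_s, \xi_t)
= N \sum_{q=1}^{40} \left[ \sum_{k=1}^{m} \frac{1}{P_k(X_q) Q_k(X_q)}
  \frac{\partial P_k(X_q)}{\partial \xi_s}
  \frac{\partial P_k(X_q)}{\partial \xi_t} \right] A(X_q)
\]

where Q_k = 1 - P_k. The inverse of the information matrix is the asymptotic variance-covariance
matrix of the GPCM item parameters. The square roots of the diagonal elements of the
variance-covariance matrix are the asymptotic standard errors of the parameters.

\[
\mathrm{VarCov}_{GPCM} =
\begin{bmatrix}
I_{GPCM}(a,a)   & I_{GPCM}(a,b_2)   & I_{GPCM}(a,b_3)   & I_{GPCM}(a,b_4)   \\
I_{GPCM}(b_2,a) & I_{GPCM}(b_2,b_2) & I_{GPCM}(b_2,b_3) & I_{GPCM}(b_2,b_4) \\
I_{GPCM}(b_3,a) & I_{GPCM}(b_3,b_2) & I_{GPCM}(b_3,b_3) & I_{GPCM}(b_3,b_4) \\
I_{GPCM}(b_4,a) & I_{GPCM}(b_4,b_2) & I_{GPCM}(b_4,b_3) & I_{GPCM}(b_4,b_4)
\end{bmatrix}^{-1}
\]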
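
A polytomous counterpart can be sketched in the same way. The assumptions here are more substantial and should be read as such: the category probabilities follow Muraki's (1992) GPCM with scaling constant D = 1.7 and the first category's step fixed at 0; the information uses the per-category 1/(P_k Q_k) weighting implied by the loglikelihood printed in this appendix; the quadrature and the central-difference step are the same hypothetical choices as in a Three-PL sketch; and the example item (a = 1.0, steps -1, 0, 1) and the function names are invented for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

D = 1.7  # assumed scaling constant


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities; `steps` holds b_2..b_m (first step fixed at 0)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    z = D * a * (theta[:, None] - np.asarray(steps, dtype=float)[None, :])
    cum = np.concatenate([np.zeros((theta.size, 1)), np.cumsum(z, axis=1)], axis=1)
    cum -= cum.max(axis=1, keepdims=True)      # stabilize the exponentials
    e = np.exp(cum)
    return e / e.sum(axis=1, keepdims=True)    # shape (n_theta, m)


def aea_varcov_gpcm(a, steps, n_examinees, n_quad=40, h=1e-5):
    """Asymptotic var-cov matrix of (a, b_2, ..., b_m) under the assumptions above."""
    x, w = hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)               # weights for a N(0, 1) ability density
    params = np.concatenate([[a], steps]).astype(float)

    def probs(p):
        return gpcm_probs(x, p[0], p[1:])

    # Guard against category probabilities of exactly 0 or 1 at extreme points.
    p = np.clip(probs(params), 1e-12, 1.0 - 1e-12)
    grads = []
    for s in range(params.size):               # central-difference dP_k/d(xi_s)
        step = np.zeros_like(params)
        step[s] = h
        grads.append((probs(params + step) - probs(params - step)) / (2.0 * h))
    grads = np.array(grads)                    # shape (n_params, n_quad, m)
    weight = w[:, None] / (p * (1.0 - p))      # A(X_q) / (P_k Q_k)
    info = n_examinees * np.einsum("sqk,qk,tqk->st", grads, weight, grads)
    return np.linalg.inv(info)


gpcm_varcov = aea_varcov_gpcm(1.0, [-1.0, 0.0, 1.0], n_examinees=1000)
gpcm_se = np.sqrt(np.diag(gpcm_varcov))        # SEs of a, b_2, b_3, b_4
print("GPCM SEs:", np.round(gpcm_se, 3))
```

Moving the step parameters far from the centre of the ability distribution, as in Figure 4b, can be explored by changing `steps` and watching the SEs grow.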
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Table 1 



Descriptive Statistics of SE Index of Item Parameter Estimates, Dependent t Tests, and Pearson
Correlation Coefficients, for the AEA and EMB Methods (N=1290, Replications for EMB = 50)

Method and    Parameter   Number    ------ AEA ------    ------ EMB ------
Parameter     Mean        of Items  Mean   Min    Max    Mean   Min    Max        t        r

Three-PL
  a           1.06        24        .20    .10    .49    .19    .08    .39      2.52*     .90
  b           1.20        24        .24    .07   2.81    .16    .07    .74      0.20      .89
  c            .23        24        .05    .01    .40    .04    .02    .10      0.70      .91
Two-PL
  a            .98         8        .08    .05    .12    .09    .04    .16      -.87      .97
  b           1.59         8        .09    .07    .14    .10    .05    .18      -.55      .97
GPCM
  a            .69        10        .04    .02    .11    .06    .03    .13     -8.42***   .98
  b_i2         .51        10        .17    .05    .33    .22    .07    .60     -4.07***   .93
  b_i3         .48        10        .16    .07    .31    .20    .06    .48     -3.89**    .96
  b_i4         .94         7        .23    .09    .64    .24    .09    .64     -1.23      .99
  b_i5         .07         4        .31    .14    .63    .32    .12    .62      0.26      .99

* P < .05; ** P < .01; *** P < .001


Table 2

Measurement Error Components for a Set of Item Parameters

Parameter  Value   BILOG-SE   AEA-SE    AEA-SE    EMB-SE    EMB-BIAS  EMB-RMSE
                   N=6426     N=6426    N=1290    N=1290    N=1290    N=1290

a          .258    .039       .078      .172      .170      .190      .255
b          .113    .581       1.275     2.807     .743      .782      1.079
c          .318    .082       .183      .402      .101      .114      .152
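
The EMB columns of this table can be checked against one another. Assuming that the EMB-RMSE was computed from the EMB-SE and EMB-BIAS in the usual way, RMSE = sqrt(SE^2 + BIAS^2) (an assumption, since the formula is not restated with the table), the tabled values are reproduced to three decimals:

\[
\begin{aligned}
a:&\quad \sqrt{.170^2 + .190^2} = \sqrt{.0650} \approx .255 \\
b:&\quad \sqrt{.743^2 + .782^2} = \sqrt{1.1636} \approx 1.079 \\
c:&\quad \sqrt{.101^2 + .114^2} = \sqrt{.0232} \approx .152
\end{aligned}
\]

For the b-parameter, the bias component (.782) is slightly larger than the random-error component (.743), which illustrates why an approach that models only V cannot capture the full measurement error of such an item.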



Table 3 



The Pearson Coefficients between the Absolute Value of BIAS Index of Item Estimate and the
AEA-SE, as well as EMB-SE Indices (N=1290, Replications for EMB = 50)

            Three-PL             Two-PL         GPCM
            a     b     c        a     b        a     b_i2  b_i3  b_i4  b_i5

AEA-SE      .57   .97   .88      .38   .71      .01   .85   .60   .57   -.09
EMB-SE      .41   .89   .84      .19   .75      .11   .93   .78   .49    .06
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Table 4 

The Average Value of Item Parameters for the Sets of 10, 15 or 20 Three-PL Items and the
Corresponding Average SE Value (N=2000)


Figure Headings

Figure 1. GPCM item categories probability curves

Figure 2a. The standard errors of item difficulties shown as the bivariate function of both item
difficulty and discrimination estimates for the Three-PL model when the guessing
parameter is 0.25.

Figure 2b. The standard errors of item discriminations shown as the bivariate function of both
item difficulty and discrimination estimates for the Three-PL model when the
guessing parameter is 0.25.

Figure 2c. The standard errors of guessing parameters shown as the bivariate function of both
item difficulty and guessing estimates for the Three-PL model when the
discrimination parameter is 1.5.

Figure 3a. The standard errors of item difficulties shown as the bivariate function of both item
difficulty and discrimination estimates for the Two-PL model.

Figure 3b. The standard errors of item discriminations shown as the bivariate function of both
item difficulty and discrimination estimates for the Two-PL model.

Figure 4a. The standard errors of item-category difficulties (b_i2) shown as the bivariate function
of both item-category difficulty (b_i2) and discrimination estimates for the GPCM
model when the item-category difficulties, b_i3 and b_i4, are -1 and 0.

Figure 4b. The standard errors of item-category difficulties (b_i2) shown as the bivariate function
of both item-category difficulty (b_i2) and discrimination estimates for the GPCM
model when the item-category difficulties, b_i3 and b_i4, are 3 and 4.

Figure 4c. The standard errors of item discriminations shown as the bivariate function of both
item-category difficulty (b_i2) and discrimination estimates for the GPCM model
when the item-category difficulties, b_i3 and b_i4, are 3 and 4.

Figure 5a. SE of a as a function of the true a-parameter for the Three-PL model.
Figure 5b. SE of b as a function of the true b-parameter for the Three-PL model.
Figure 5c. SE of c as a function of the true c-parameter for the Three-PL model.
Figure 6a. SE of a as a function of the true a-parameter for the Two-PL model.
Figure 6b. SE of b as a function of the true b-parameter for the Two-PL model.
Figure 7a. SE of a as a function of the true a-parameter for the GPCM model.
Figure 7b. SE of b_i2 as a function of the true item-category parameter, b_i2, for the GPCM model.
Figure 7c. SE of b_i3 as a function of the true item-category parameter, b_i3, for the GPCM model.
Figure 7d. SE of b_i4 as a function of the true item-category parameter, b_i4, for the GPCM model.
Figure 7e. SE of b_i5 as a function of the true item-category parameter, b_i5, for the GPCM model.
Figure 8. Standard Error as a Function of Raw Score for the Analytic and Bootstrap Method
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Figure 1 . GPCM item categories probability curves 
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Figure 2a. The standard errors of item difficulties shown as the bivariate function of both item 
difficulty and discrimination estimates for the Three-PL model when the guessing 
parameter is 0.25. 
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Figure 2b. The standard errors of item discriminations shown as the bivariate function of both 
item difficulty and discrimination estimates for the Three-PL model when the 
guessing parameter is 0.25. 
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Figure 2c. The standard errors of guessing parameters shown as the bivariate function of both 
item difficulty and guessing estimates for the Three-PL model when the 
discrimination parameter is 1.5.
Figure 3a. The standard errors of item difficulties shown as the bivariate function of both item
difficulty and discrimination estimates for the Two-PL model.
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Figure 3b. The standard errors of item discriminations shown as the bivariate function of both 
item difficulty and discrimination estimates for the Two-PL model.
Figure 4a. The standard errors of item-category difficulties (b_i2) shown as the bivariate function
of both item-category difficulty (b_i2) and discrimination estimates for the GPCM
model when the item-category difficulties, b_i3 and b_i4, are -1 and 0.
Figure 4b. The standard errors of item-category difficulties (b_i2) shown as the bivariate function
of both item-category difficulty (b_i2) and discrimination estimates for the GPCM
model when the item-category difficulties, b_i3 and b_i4, are 3 and 4.
Figure 4c. The standard errors of item discriminations shown as the bivariate function of both
item-category difficulty (b_i2) and discrimination estimates for the GPCM model
when the item-category difficulties, b_i3 and b_i4, are 3 and 4.
Figure 5a. SE of a as a function of the true a-parameter for the Three-PL model.
Figure 5b. SE of b as a function of the true b-parameter for the Three-PL model.
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Figure 5c. SE of c as a function of the true c-parameter for the Three-PL model. 
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Figure 6a. SE of a as a function of the true a-parameter for the Two-PL model. 
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Figure 6b. SE of b as a function of the true b-parameter for the Two-PL model. 
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Figure 7a. SE of a as a function of the true a-parameter for the GPCM model. 



Figure 7b. SE of b_i2 as a function of the true item-category parameter, b_i2, for the GPCM model.
(Axes: SE of b_i2 against the true GPCM category b_i2 parameter; legend: EMB, AEA.)
Figure 7d. SE of b_i4 as a function of the true item-category parameter, b_i4, for the GPCM model.
Figure 7e. SE of b_i5 as a function of the true item-category parameter, b_i5, for the GPCM model.



Figure 8. Standard Error as a Function of Raw Score for the Analytic and Bootstrap Method
(Axes: Standard Error, 0 to 2, against New Test Score, 70 to 120; legend: Analytic, Bootstrap.)
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